In this paper, we propose a new semiparametric regression estimator that combines a parametric approach with the nonparametric penalized spline method. The overall shape of the true regression function is captured by the parametric part, while its residual is consistently estimated by the nonparametric part. Asymptotic theory for the proposed semiparametric estimator is developed, showing that its behavior depends on the asymptotics of the nonparametric penalized spline estimator as well as on the discrepancy between the true regression function and the parametric part. As a natural application of the asymptotics, some criteria for selecting the parametric model are addressed. Numerical experiments show that the proposed estimator performs better than the existing kernel-based semiparametric estimator and the fully nonparametric estimator, and that the proposed criteria work well for choosing a reasonable parametric model.
1 Introduction

There are several nonparametric smoothing techniques for regression problems, such as lowess, kernel smoothing, spline smoothing, wavelets and the series method. Nonparametric estimators are generally consistent, which is an advantage of this approach: if a nonparametric estimator is used, we can expect the true regression function to be captured as the sample size increases. However, because the form of a nonparametric estimator is sometimes complicated, the interpretation of the estimated structure may not be clear.

On the other hand, in a parametric regression problem, where the regression function is controlled by a finite-dimensional parameter vector, the estimated structure is easy to understand; however, the estimator is not consistent unless the parametric model is correctly specified. Each approach therefore has advantages and disadvantages. This motivates us to consider a hybrid of parametric and nonparametric methods for the regression problem: we introduce a semiparametric regression method whose estimator has the advantages of both approaches.

The semiparametric method in this paper consists of two steps. In the first step, we utilize 
an appropriate parametric estimator. In the second step, we apply a certain nonparametric 
smoother to the residual data associated with the parametric estimator in the first step. The 
parametric estimator in the first step and the nonparametric smoother in the second step are 
combined into the proposed semiparametric estimator. 

Similar semiparametric approaches to smoothing have been developed by many authors. Hjort and Glad (1995) and Naito (2004) discussed similar methods in the density estimation literature. Glad (1998) and Naito (2002) addressed the semiparametric regression method. Martins et al. (2008) introduced a general decomposition in regression, including additive and multiplicative corrections. Recently, Fan et al. (2009) discussed the semiparametric approach in the framework of generalized linear models. Note that all of the aforementioned works used kernel smoothing in the second-step estimation.

Our proposal is to use the penalized spline method for residual smoothing in the second step. The penalized spline, developed by O'Sullivan (1986) and Eilers and Marx (1996), is a typical technique in nonparametric regression that achieves a good fit with appropriate smoothness; many of its applications are summarized in Ruppert et al. (2003). Throughout this paper, the fully nonparametric penalized spline estimator is denoted by NPSE, while the semiparametric penalized spline estimator, obtained by the two-step procedure described above, is denoted by SPSE. The advantages of using the penalized spline instead of the kernel method are described both theoretically and numerically. In particular, we find in simulation that the SPSE behaves better than the semiparametric local linear estimator (SLLE).

This paper is organized as follows. We describe the proposed SPSE in Section 2. Section 3 discusses the asymptotic properties of the SPSE, which are obtained by combining the asymptotic results for the parametric estimator with those for the NPSE developed by Claeskens et al. (2009). The asymptotic bias of the SPSE depends on the initial parametric model used in the first step, and its form suggests a method of choosing that model. A theoretical comparison of the SPSE with the SLLE in terms of asymptotic bias is also given, which supports using the penalized spline rather than a kernel smoother in the second step. In Section 4, we propose some criteria for parametric model selection. If a parametric model chosen by these criteria is used as the parametric part of the SPSE, the asymptotic bias of the SPSE becomes smaller than that of the NPSE. Simulation results are reported in Section 5; they include checking the accuracy of the SPSE, comparing it with the NPSE and the SLLE as regression estimators, and investigating the performance of the parametric model selection of Section 4. Related discussion and issues for future research are provided in Section 6. Proofs of the theoretical results are given in the Appendix.

2 Semiparametric penalized spline estimator 

Consider the dataset $\{(x_i, y_i) : i = 1, \dots, n\}$ and the regression model

$$ y_i = f(x_i) + \varepsilon_i, \qquad i = 1, \dots, n, $$

where the explanatory variable $X_i$ is generated from a density $q(x)$ with support $[0,1]$, $f(x) = E[Y \mid X = x]$ is an unknown regression function, and the errors $\varepsilon_i$ are assumed to be uncorrelated with $E[\varepsilon_i \mid X_i = x_i] = 0$ and $V[\varepsilon_i \mid X_i = x_i] = \sigma^2(x_i) < \infty$. Let $f(x\mid\beta)$, $\beta \in B \subset \mathbb{R}^m$, be a parametric model. We now construct the semiparametric estimator of $f(x)$. First, we obtain an estimator $\hat\beta$ of $\beta$ by a suitable method of estimation. Then $f(x)$ can be written as

$$ f(x) = f(x\mid\hat\beta) + f(x\mid\hat\beta)^{\gamma}\, r_\gamma(x,\hat\beta), \qquad (1) $$

where $r_\gamma(x,\hat\beta) = \{f(x) - f(x\mid\hat\beta)\}/f(x\mid\hat\beta)^{\gamma}$ for some $\gamma \in \{0,1\}$. When $\gamma = 0$, this decomposition becomes $f(x) = f(x\mid\hat\beta) + \{f(x) - f(x\mid\hat\beta)\}$, which is called an additive correction. When $\gamma = 1$, on the other hand, we have the multiplicative correction $f(x) = f(x\mid\hat\beta)\{f(x)/f(x\mid\hat\beta)\}$. The parameter $\gamma$ allows us to treat additive and multiplicative corrections in a unified way (see Fan et al. (2009)). In the second step, $r_\gamma(x,\hat\beta)$ is estimated by applying a nonparametric technique to $\{(x_i, \{y_i - f(x_i\mid\hat\beta)\}/f(x_i\mid\hat\beta)^{\gamma}) : i = 1, \dots, n\}$. The SPSE is obtained as

$$ \hat f(x,\gamma) = f(x\mid\hat\beta) + f(x\mid\hat\beta)^{\gamma}\, \hat r_\gamma(x,\hat\beta), \qquad (2) $$

where $\hat r_\gamma(x,\hat\beta)$ is a nonparametric estimator of $r_\gamma(x,\hat\beta)$.

Figure 1: Plots for one random sample of the true $f(x)$ (dashed) and the parametric estimator $f(x\mid\hat\beta)$ (solid) in the left panel, the residuals and the penalized spline estimator of $r_0(x,\hat\beta)$ (solid) in the middle panel, and the true $f(x)$ (dashed) and the SPSE $\hat f(x,0)$ (solid) in the right panel.
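To make the decomposition (1) and the recombination (2) concrete, here is a minimal sketch in Python with NumPy (an assumption; the paper does not specify software). The function names `working_response` and `combine_spse` are hypothetical. The sketch forms the second-step responses $\{y_i - f(x_i\mid\hat\beta)\}/f(x_i\mid\hat\beta)^{\gamma}$ and checks that, if the second step recovered $r_\gamma$ exactly, (2) would return $f$ for both corrections; the parametric fit used in the check is made up for illustration.

```python
import numpy as np

def working_response(y, f_par, gamma):
    """Second-step responses {y_i - f(x_i|beta_hat)} / f(x_i|beta_hat)^gamma."""
    if gamma not in (0, 1):
        raise ValueError("gamma must be 0 (additive) or 1 (multiplicative)")
    y, f_par = np.asarray(y, dtype=float), np.asarray(f_par, dtype=float)
    if gamma == 1 and np.any(f_par <= 0):
        # multiplicative correction divides by the parametric fit (cf. Assumption 1)
        raise ValueError("multiplicative correction needs f(x|beta_hat) > 0")
    return (y - f_par) / f_par**gamma

def combine_spse(f_par, r_hat, gamma):
    """Equation (2): f(x|beta_hat) + f(x|beta_hat)^gamma * r_hat(x)."""
    f_par = np.asarray(f_par, dtype=float)
    return f_par + f_par**gamma * np.asarray(r_hat, dtype=float)

# Noise-free check: if the second step recovered r_gamma exactly, (2) returns f.
x = np.linspace(0, 1, 11)
f_true = 2 + np.sin(2 * np.pi * x)
f_par = 2 + 0.8 * np.sin(2 * np.pi * x)   # hypothetical, deliberately imperfect fit
recovered = {g: combine_spse(f_par, working_response(f_true, f_par, g), g) for g in (0, 1)}
```

In practice the exact $r_\gamma$ is replaced by the penalized spline estimate of the second step.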

We adopt the penalized spline to estimate $r_\gamma(x,\hat\beta)$. Let $\{B^{[p]}_{-p+1}(x), \dots, B^{[p]}_{K_n}(x)\}$ be a B-spline basis of degree $p$ with equally spaced knots $\kappa_k = k/K_n$ ($k = -p+1, \dots, K_n+p$). Then we consider the B-spline model

$$ \sum_{k=-p+1}^{K_n} B^{[p]}_k(x)\, b_k $$

as an approximation to $r_\gamma(x,\hat\beta)$, where the $b_k$ are unknown parameters. The definition and fundamental properties of the B-spline basis are detailed in de Boor (2001). Let $R_\gamma$ be the $n$-vector with $i$th element $\{y_i - f(x_i\mid\hat\beta)\}/f(x_i\mid\hat\beta)^{\gamma}$, and let $Z = (B^{[p]}_{-p+j}(x_i))_{ij}$ and $b = (b_{-p+1} \cdots b_{K_n})'$. The penalized spline estimator $\hat b = (\hat b_{-p+1} \cdots \hat b_{K_n})'$ of $b$ is defined as the minimizer of

$$ (R_\gamma - Zb)'(R_\gamma - Zb) + \lambda_n b' Q_m b, $$

where $\lambda_n$ is the smoothing parameter and $Q_m = D_m'D_m$ is the penalty matrix built from the $m$th-order difference matrix $D_m$. The estimator of $r_\gamma(x,\hat\beta)$ is defined as

$$ \hat r_\gamma(x,\hat\beta) = \sum_{k=-p+1}^{K_n} B^{[p]}_k(x)\, \hat b_k = B(x)'(Z'Z + \lambda_n Q_m)^{-1} Z' R_\gamma, \qquad (3) $$

where $B(x) = (B^{[p]}_{-p+1}(x) \cdots B^{[p]}_{K_n}(x))'$.
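A minimal NumPy sketch of the penalized spline estimator (3) follows, assuming the Cox-de Boor recursion for the basis and $Q_m = D_m'D_m$ built with `numpy.diff`. The helper names are hypothetical. To obtain exactly $K_n + p$ basis functions, the sketch places knots at $\kappa_k = k/K_n$ for $k = -p, \dots, K_n + p$; this indexing is an assumption of the sketch. The example uses the fact that a straight line is reproduced by B-splines of degree $p \ge 1$ and lies in the null space of the second-order difference penalty, so the penalized fit returns it exactly for any $\lambda_n$.

```python
import numpy as np

def bspline_basis(x, n_knots, degree):
    """Matrix of the K_n + p B-splines of degree p >= 1 on [0, 1], equally spaced
    knots k / K_n (k = -p, ..., K_n + p), evaluated by the Cox-de Boor recursion."""
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]
    t = np.arange(-degree, n_knots + degree + 1) / n_knots
    B = ((t[:-1] <= x) & (x < t[1:])).astype(float)        # degree-0 indicators
    for d in range(1, degree + 1):
        left = (x - t[:-(d + 1)]) / (t[d:-1] - t[:-(d + 1)]) * B[:, :-1]
        right = (t[d + 1:] - x) / (t[d + 1:] - t[1:-d]) * B[:, 1:]
        B = left + right
    return B

def difference_penalty(n_basis, order):
    """Q_m = D_m' D_m, where D_m takes order-m differences of the coefficients."""
    D = np.diff(np.eye(n_basis), n=order, axis=0)
    return D.T @ D

def fit_penalized_spline(x, r, n_knots, degree, order, lam):
    """b_hat = (Z'Z + lam Q_m)^{-1} Z'r, minimiser of (r - Zb)'(r - Zb) + lam b'Q_m b."""
    Z = bspline_basis(x, n_knots, degree)
    Q = difference_penalty(Z.shape[1], order)
    return np.linalg.solve(Z.T @ Z + lam * Q, Z.T @ r)

# A straight line lies in the null space of the second-order penalty,
# so the penalized fit reproduces it whatever the value of lam.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
r = 1.0 + 2.0 * x
b_hat = fit_penalized_spline(x, r, n_knots=10, degree=3, order=2, lam=5.0)
grid = np.linspace(0, 1, 11)
r_hat = bspline_basis(grid, 10, 3) @ b_hat       # equation (3) on a grid
```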

Figure 1 shows an example of the SPSE. The left panel shows the true function $f(x) = \exp(-x^2)\sin(2\pi x)$ and the least squares estimator $f(x\mid\hat\beta)$ of the cubic model $f(x\mid\beta) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$. The middle panel shows the residuals of $f(x\mid\hat\beta)$ and the penalized spline estimator of $r_0(x,\hat\beta)$. The right panel shows the true function and the SPSE given in (2). In this example, the parametric part captures the overall shape of $f(x)$, and the nonparametric part explains the details that $f(x\mid\hat\beta)$ cannot capture. An SPSE with multiplicative correction can be constructed similarly.
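The two steps behind Figure 1 can be sketched end to end as follows, under stated assumptions: additive correction ($\gamma = 0$), a least squares cubic for the parametric part, and fixed illustrative values $n = 100$, noise standard deviation 0.2, $K_n = 20$, $p = 3$, $m = 2$ and $\lambda_n = 1$ (the paper does not give these values for Figure 1, and Section 5 chooses $K_n$ and $\lambda_n$ by GCV). The function names are hypothetical.

```python
import numpy as np

def bspline_basis(x, n_knots, degree):
    """K_n + p B-splines of degree p >= 1 on [0, 1], knots k / K_n (k = -p..K_n + p)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]
    t = np.arange(-degree, n_knots + degree + 1) / n_knots
    B = ((t[:-1] <= x) & (x < t[1:])).astype(float)
    for d in range(1, degree + 1):
        B = ((x - t[:-(d + 1)]) / (t[d:-1] - t[:-(d + 1)]) * B[:, :-1]
             + (t[d + 1:] - x) / (t[d + 1:] - t[1:-d]) * B[:, 1:])
    return B

def difference_penalty(n_basis, order):
    D = np.diff(np.eye(n_basis), n=order, axis=0)
    return D.T @ D

def spse(x, y, poly_degree=3, n_knots=20, degree=3, order=2, lam=1.0):
    """Additive-correction SPSE: least-squares polynomial (step 1), then a
    penalized spline fitted to its residuals (step 2, equation (3))."""
    coef = np.polyfit(x, y, poly_degree)                  # step 1
    resid = y - np.polyval(coef, x)                       # working response R_0
    Z = bspline_basis(x, n_knots, degree)
    A = Z.T @ Z + lam * difference_penalty(Z.shape[1], order)
    b_hat = np.linalg.solve(A, Z.T @ resid)               # step 2

    def f_hat(t):
        return np.polyval(coef, t) + bspline_basis(t, n_knots, degree) @ b_hat

    return f_hat, coef

rng = np.random.default_rng(1)
n = 100
x = rng.uniform(0, 1, n)
y = np.exp(-x**2) * np.sin(2 * np.pi * x) + rng.normal(0, 0.2, n)
f_hat, coef = spse(x, y)
rss_parametric = np.sum((y - np.polyval(coef, x)) ** 2)
rss_spse = np.sum((y - f_hat(x)) ** 2)
```

Because $\hat b$ minimizes the penalized criterion, whose value at $b = 0$ equals the residual sum of squares of the cubic fit, the in-sample residual sum of squares of this SPSE can never exceed that of the parametric fit alone; this gives a simple sanity check of an implementation.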

3 Asymptotic Result 

Asymptotics for the NPSE were developed by Claeskens et al. (2009). Using their results, we derive the asymptotic bias, variance and distribution of the SPSE. We first state the assumptions used in the asymptotic analysis of the SPSE.
Assumptions

1. There exists $a > 0$ such that $a < f(x\mid\beta)$ for all $x \in [0,1]$, $\beta \in B$.

2. $\sup_{z\in[0,1]} q(z) < \infty$.

3. $|\partial f(x\mid\beta)/\partial\beta_t| < \infty$ for $x \in [0,1]$, $\beta \in B$, $t = 1, \dots, m$.

4. $|\partial^2 f(x\mid\beta)/\partial\beta_i\partial\beta_j| < \infty$ for $x \in [0,1]$, $\beta \in B$, $i, j = 1, \dots, m$.

5. $|d^i f(x)/dx^i| < \infty$ for $x \in [0,1]$, $i = 1, \dots, p+1$.

6. $K_n = o(n^{1/2})$ and $\lambda_n = o(nK_n^{-1})$.

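Define the $(K_n+p)\times(K_n+p)$ matrix $G(q) = (g_{ij})_{ij}$, where

$$ g_{ij} = \int_0^1 B^{[p]}_{-p+i}(u)\, B^{[p]}_{-p+j}(u)\, q(u)\, du, $$

and the $(K_n+p)\times(K_n+p)$ matrix $G(\sigma,\beta,\gamma,q) = (g_{\sigma,ij})_{ij}$, where

$$ g_{\sigma,ij} = \int_0^1 B^{[p]}_{-p+i}(u)\, B^{[p]}_{-p+j}(u)\, \frac{\sigma^2(u)\, q(u)}{f(u\mid\beta)^{2\gamma}}\, du. $$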



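Let $b^*(\beta,\gamma)$ be a best $L_\infty$ approximation to $\{f(x) - f(x\mid\beta)\}/f(x\mid\beta)^{\gamma}$. This means that $b^*(\beta,\gamma)$ satisfies

$$ \sup_{x\in(0,1)} \left| \frac{f(x) - f(x\mid\beta)}{f(x\mid\beta)^{\gamma}} + \frac{b_a(x\mid\beta,\gamma)}{f(x\mid\beta)^{\gamma}} - B(x)'b^*(\beta,\gamma) \right| = o(K_n^{-(p+1)}), $$

where

$$ b_a(x\mid\beta,\gamma) = -\frac{f(x\mid\beta)^{\gamma}\, r_\gamma^{(p+1)}(x,\beta)}{K_n^{p+1}(p+1)!} \sum_{j=1}^{K_n} I(\kappa_{j-1} < x < \kappa_j)\, B_{p+1}\!\left(\frac{x - \kappa_{j-1}}{K_n^{-1}}\right), $$

$I(a < x < b)$ is the indicator function of the interval $(a,b)$, and $B_p(x)$ is the $p$th Bernoulli polynomial.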

We now state a condition on the parametric estimator. Let $F$ be the true distribution of $(X,Y)$, and let $F_n$ be the corresponding empirical distribution. The estimator of $\beta$ is defined in functional form as $\hat\beta = T(F_n)$, where $T(\cdot)$ is an $\mathbb{R}^m$-valued functional defined on the set of all distributions. We then have $\hat\beta \to \beta_0$ as $n \to \infty$, where $\beta_0 = T(F)$ is defined as the optimizer of some distance measure $\rho$. We assume that $f(x\mid\beta_0)$ is the best approximation of $f(x)$ in this sense. By the definition of $\hat\beta$, $\hat\beta - \beta_0$ can be expressed as

$$ \hat\beta - \beta_0 = \frac{1}{n}\sum_{i=1}^{n} I(X_i,Y_i) + \frac{d}{n} + \delta_n, \qquad (4) $$

where $I(X_i,Y_i)$ is the influence function, defined as

$$ I(X_i,Y_i) = \lim_{\epsilon\to 0} \frac{T\big((1-\epsilon)F + \epsilon\,\delta(X_i,Y_i)\big) - T(F)}{\epsilon}, $$

with $E[I(X_i,Y_i)] = 0$ and a finite covariance matrix. Here the delta function $\delta(X,Y)$ places probability 1 at the point $(X,Y)$, and $d/n$ is the leading bias of $\hat\beta$. The remaining term $\delta_n$ has mean $O(n^{-2})$ in each component.
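As an illustration of expansion (4), consider least squares for the linear model $f(x\mid\beta) = \beta_0 + \beta_1 x$, a choice made here only for the example. For this functional, $T(F) = E[XX']^{-1}E[XY]$ with $X = (1, x)'$, and its influence function is $I(x,y) = E[XX']^{-1}X(y - X'\beta_0)$. The NumPy sketch below uses simulated data whose true regression is linear (so $d = 0$) and compares $\hat\beta - \beta_0$ with the average influence term; the leftover difference is of order $O_P(n^{-1})$. All numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
beta0 = np.array([1.0, -2.0])
x = rng.uniform(0, 1, n)
X = np.column_stack([np.ones(n), x])             # regressors for beta_0 + beta_1 x
y = X @ beta0 + rng.normal(0, 0.5, n)

# Least squares: T(F) = E[XX']^{-1} E[XY], influence function
# I(x, y) = E[XX']^{-1} X (y - X'beta0). For x ~ U(0, 1), E[XX'] = [[1, 1/2], [1/2, 1/3]].
M = np.array([[1.0, 0.5], [0.5, 1.0 / 3.0]])
scores = X * (y - X @ beta0)[:, None]            # n x 2 array of X_i (Y_i - X_i'beta0)
leading = np.linalg.solve(M, scores.mean(axis=0))  # (1/n) sum_i I(X_i, Y_i)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
remainder = (beta_hat - beta0) - leading         # higher-order part, O_P(1/n) here
```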

For clarity, we investigate the asymptotic properties of $\hat f(x,\gamma)$ in two steps. First, we derive the asymptotic expectation and variance of $\hat f_0(x,\gamma) = f(x\mid\beta_0) + f(x\mid\beta_0)^{\gamma}\hat r_\gamma(x,\beta_0)$, where $\hat r_\gamma(x,\beta_0)$ is the penalized spline smoother of $r_\gamma(x,\beta_0)$. Second, we show that the difference between $\hat f(x,\gamma)$ and $\hat f_0(x,\gamma)$ vanishes asymptotically. Since $\beta_0$ is not stochastic, the asymptotic properties of $\hat f_0(x,\gamma)$ depend only on the nonparametric penalized spline estimator of $r_\gamma(x,\beta_0)$. Hence we obtain

$$ E[\hat f_0(x,\gamma)\mid X_n] = f(x\mid\beta_0) + f(x\mid\beta_0)^{\gamma} E[\hat r_\gamma(x,\beta_0)\mid X_n], $$
$$ V[\hat f_0(x,\gamma)\mid X_n] = f(x\mid\beta_0)^{2\gamma} V[\hat r_\gamma(x,\beta_0)\mid X_n]. $$

Here, for a random variable $U_n$, $E[U_n\mid X_n]$ and $V[U_n\mid X_n]$ are the conditional expectation and variance of $U_n$ given $(X_1, \dots, X_n) = (x_1, \dots, x_n)$. The asymptotic properties of $\hat r_\gamma(x,\beta_0)$ follow directly from Theorem 2(a) of Claeskens et al. (2009).

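Proposition 1. Let $f \in C^{p+1}$ and $f(\cdot\mid\beta_0) \in C^{p+1}$. Then, under the Assumptions, for fixed $x \in (0,1)$,

$$ E[\hat f_0(x,\gamma)\mid X_n] = f(x) + b_a(x\mid\beta_0,\gamma) + b_\lambda(x\mid\beta_0,\gamma) + o_P(K_n^{-(p+1)}) + o_P(\lambda_n K_n n^{-1}), $$
$$ V[\hat f_0(x,\gamma)\mid X_n] = \frac{1}{n} f(x\mid\beta_0)^{2\gamma} B(x)'G(q)^{-1}G(\sigma,\beta_0,\gamma,q)G(q)^{-1}B(x) + o_P(K_n n^{-1}), $$

where

$$ b_a(x\mid\beta_0,\gamma) = -\frac{f(x\mid\beta_0)^{\gamma}\, r_\gamma^{(p+1)}(x,\beta_0)}{K_n^{p+1}(p+1)!} \sum_{j=1}^{K_n} I(\kappa_{j-1} < x < \kappa_j)\, B_{p+1}\!\left(\frac{x - \kappa_{j-1}}{K_n^{-1}}\right), $$
$$ b_\lambda(x\mid\beta_0,\gamma) = -\frac{\lambda_n}{n} f(x\mid\beta_0)^{\gamma} B(x)'G(q)^{-1}Q_m b^*(\beta_0,\gamma). $$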

We now give the asymptotic result for $\hat f(x,\gamma)$. Using (4), $f(x\mid\hat\beta)$ and $\hat r_\gamma(x,\hat\beta)$ are expanded about $f(x\mid\beta_0)$ and $\hat r_\gamma(x,\beta_0)$, respectively. The proof in the Appendix shows that the asymptotic expectation and variance of $\hat f(x,\gamma)$ are dominated by those of $\hat f_0(x,\gamma)$, and we obtain the following theorem.

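Theorem 1. Let $f \in C^{p+1}$ and $f(\cdot\mid\beta_0) \in C^{p+1}$. Then, under the Assumptions, for fixed $x \in (0,1)$,

$$ E[\hat f(x,\gamma)\mid X_n] = f(x) + b_a(x\mid\beta_0,\gamma) + b_\lambda(x\mid\beta_0,\gamma) + O_P(n^{-1}) + o_P(K_n^{-(p+1)}) + o_P(\lambda_n K_n n^{-1}), $$
$$ V[\hat f(x,\gamma)\mid X_n] = \frac{1}{n} f(x\mid\beta_0)^{2\gamma} B(x)'G(q)^{-1}G(\sigma,\beta_0,\gamma,q)G(q)^{-1}B(x) + o_P(K_n n^{-1}), $$

where $b_a(x\mid\beta_0,\gamma)$ and $b_\lambda(x\mid\beta_0,\gamma)$ are those given in Proposition 1.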

Theorem 1 and Lyapunov's theorem yield the asymptotic distribution of the SPSE.

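Theorem 2. Suppose that $E[|\varepsilon_i|^{2+\delta}\mid X_i = x_i] < C$ for some $\delta > 2$ and that the Assumptions are satisfied. Then, with $K_n = O(n^{1/(2p+3)})$ and $\lambda_n = O(n^{(p+2)/(2p+3)})$,

$$ \frac{\hat f(x,\gamma) - f(x) - b_a(x\mid\beta_0,\gamma) - b_\lambda(x\mid\beta_0,\gamma)}{\sqrt{V[\hat f(x,\gamma)\mid X_n]}} \xrightarrow{d} N(0,1), $$

where $b_a(x\mid\beta_0,\gamma)$ and $b_\lambda(x\mid\beta_0,\gamma)$ are those given in Proposition 1.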
If $\lambda_n = 0$, we obtain the semiparametric regression spline estimator from (2). Thus, the asymptotic results for the semiparametric regression spline are contained in Theorems 1 and 2. These results are obtained for a single parametric model. If we choose a polynomial model as $f(x\mid\beta)$, we obtain the following corollary.

Corollary 1. Let $f_q(x\mid\beta_q)$ ($q \le p$) be the $q$th-degree polynomial model. Then the SPSE is the same as the NPSE when $\lambda_n = 0$ and $\gamma = 0$, or when $\lambda_n > 0$ and $\gamma = 0$ with $p = 1$, $Q_2$ and equidistant knots.
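Corollary 1 can be checked numerically in the setting $p = 1$, $m = 2$, equidistant knots and $\gamma = 0$. In this minimal NumPy sketch (the simulated data and the values of $n$, $K_n$ and $\lambda_n$ are illustrative assumptions), the penalized spline is a linear smoother with hat matrix $H$. A straight-line least squares fit is reproduced by $H$, so the SPSE coincides with the NPSE, whereas a cubic parametric part ($q = 3 > p$) breaks the equality.

```python
import numpy as np

def bspline_basis(x, n_knots, degree):
    """K_n + p B-splines of degree p >= 1 on [0, 1], knots k / K_n (k = -p..K_n + p)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]
    t = np.arange(-degree, n_knots + degree + 1) / n_knots
    B = ((t[:-1] <= x) & (x < t[1:])).astype(float)
    for d in range(1, degree + 1):
        B = ((x - t[:-(d + 1)]) / (t[d:-1] - t[:-(d + 1)]) * B[:, :-1]
             + (t[d + 1:] - x) / (t[d + 1:] - t[1:-d]) * B[:, 1:])
    return B

def difference_penalty(n_basis, order):
    D = np.diff(np.eye(n_basis), n=order, axis=0)
    return D.T @ D

rng = np.random.default_rng(3)
n, K, lam = 100, 15, 10.0
x = rng.uniform(0, 1, n)
y = np.exp(-x**2) * np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

Z = bspline_basis(x, K, 1)                               # p = 1: linear B-splines
Q = difference_penalty(Z.shape[1], 2)                    # second-order penalty Q_2
H = Z @ np.linalg.solve(Z.T @ Z + lam * Q, Z.T)          # hat matrix of the smoother

npse = H @ y                                             # fully nonparametric fit
f_par1 = np.polyval(np.polyfit(x, y, 1), x)              # q = 1 <= p
spse_linear = f_par1 + H @ (y - f_par1)                  # additive correction
f_par3 = np.polyval(np.polyfit(x, y, 3), x)              # q = 3 > p
spse_cubic = f_par3 + H @ (y - f_par3)
gap_cubic = np.max(np.abs(spse_cubic - npse))
```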

Remark 1 From Theorem 2, we can construct an asymptotic pointwise confidence interval for $f(x)$ by estimating the error variance.
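One way to act on Remark 1 is sketched below, with several simplifying assumptions that go beyond the remark: a linear parametric part with additive correction, $\hat\beta$ treated as fixed (Theorem 1 shows the variance is dominated by that of $\hat f_0$), homoscedastic errors whose variance is estimated from the residuals using the effective degrees of freedom $\mathrm{tr}(H)$, and the bias terms $b_a$ and $b_\lambda$ ignored. The variance is the finite-sample analogue $\sigma^2 s(x)'s(x)$, where $s(x)' = B(x)'(Z'Z + \lambda_nQ_m)^{-1}Z'$ are the smoother weights. The function name and tuning values are hypothetical.

```python
import numpy as np

def bspline_basis(x, n_knots, degree):
    """K_n + p B-splines of degree p >= 1 on [0, 1], knots k / K_n (k = -p..K_n + p)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]
    t = np.arange(-degree, n_knots + degree + 1) / n_knots
    B = ((t[:-1] <= x) & (x < t[1:])).astype(float)
    for d in range(1, degree + 1):
        B = ((x - t[:-(d + 1)]) / (t[d:-1] - t[:-(d + 1)]) * B[:, :-1]
             + (t[d + 1:] - x) / (t[d + 1:] - t[1:-d]) * B[:, 1:])
    return B

def difference_penalty(n_basis, order):
    D = np.diff(np.eye(n_basis), n=order, axis=0)
    return D.T @ D

def spse_pointwise_ci(x, y, grid, n_knots=15, degree=3, order=2, lam=1.0, z=1.96):
    """Approximate pointwise intervals for an additive SPSE with a linear parametric
    part; beta_hat treated as fixed, constant error variance, bias ignored."""
    coef = np.polyfit(x, y, 1)
    resid_par = y - np.polyval(coef, x)
    Z = bspline_basis(x, n_knots, degree)
    A_inv = np.linalg.inv(Z.T @ Z + lam * difference_penalty(Z.shape[1], order))
    b_hat = A_inv @ Z.T @ resid_par
    df = np.trace(Z @ A_inv @ Z.T)                           # effective degrees of freedom
    sigma2 = np.sum((resid_par - Z @ b_hat) ** 2) / (len(y) - df)
    S = bspline_basis(grid, n_knots, degree) @ A_inv @ Z.T    # smoother weights at grid
    est = np.polyval(coef, grid) + S @ resid_par
    half = z * np.sqrt(sigma2 * np.sum(S**2, axis=1))         # z * sqrt(sigma^2 s(x)'s(x))
    return est - half, est, est + half

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 150)
y = 2 + np.sin(2 * np.pi * x) + rng.normal(0, 0.5, 150)
grid = np.linspace(0.05, 0.95, 19)
lower, est, upper = spse_pointwise_ci(x, y, grid)
```

Because the bias is ignored, such intervals are only reasonable when the bias terms are negligible relative to the standard error, for example after undersmoothing.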

Remark 2 Theorems 1 and 2 apply for $\gamma \in \{0,1\}$. When $\gamma = 0$, the results are those for the additive correction. When $\gamma = 1$, $b_a(x\mid\beta_0,1)$ and the variance agree with those of the estimator for the multiplicative correction. In $b_\lambda(x\mid\beta_0,1)$, $b^*(\beta_0,1)$ is understood to be a best approximation of $f(x)/f(x\mid\beta_0) - 1$. Since the B-spline basis forms a partition of unity, $B(x)'\mathbf{1} = 1$, so $b^*(\beta_0,1)$ can be written as $b^*(\beta_0,1) = b^* - \mathbf{1}$, where $b^*$ is a best approximation of $f(x)/f(x\mid\beta_0)$ and $\mathbf{1}$ is the $(K_n+p)$-vector with all components equal to 1. Consequently, $b_\lambda(x\mid\beta_0,1)$ can be written as

$$ b_\lambda(x\mid\beta_0,1) = -\frac{\lambda_n}{n} f(x\mid\beta_0) B(x)'G(q)^{-1}Q_m b^*, $$

because all components of $Q_m\mathbf{1}$ vanish.
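The two facts used in Remark 2, $Q_m\mathbf{1} = 0$ and hence $Q_m(b^* - \mathbf{1}) = Q_m b^*$, are easy to confirm numerically. In this short NumPy sketch, $Q_m = D_m'D_m$ is built with `numpy.diff` (an assumption about how the penalty matrix is formed), and the coefficient vector is arbitrary.

```python
import numpy as np

def difference_penalty(n_basis, order):
    """Q_m = D_m' D_m, with D_m the order-m difference matrix."""
    D = np.diff(np.eye(n_basis), n=order, axis=0)
    return D.T @ D

n_basis = 13                                    # e.g. K_n = 10 and p = 3
ones = np.ones(n_basis)
for m in (1, 2, 3):
    Q = difference_penalty(n_basis, m)
    print(m, np.abs(Q @ ones).max())            # all components of Q_m 1 vanish

b_star = np.linspace(0.5, 2.0, n_basis) ** 2    # arbitrary coefficient vector
Q2 = difference_penalty(n_basis, 2)
shift_invariant = np.allclose(Q2 @ (b_star - ones), Q2 @ b_star)
```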

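Remark 3 When $f(x) = f(x\mid\beta_0)$ is assumed, we obtain $b_a(x\mid\beta_0,\gamma) = 0$ and $b_\lambda(x\mid\beta_0,\gamma) = 0$ by choosing $b^*(\beta_0,\gamma) = 0$ as a best approximation of 0. For $\gamma = 1$ in particular, $b_a(x\mid\beta_0,1) = 0$ and $b_\lambda(x\mid\beta_0,1) = 0$ both hold even when $f(x) = cf(x\mid\beta_0)$ for any constant $c \ne 0$: then $r_1(x,\beta_0) = c - 1$ is constant, so its $(p+1)$th derivative vanishes and $b^*(\beta_0,1) = (c-1)\mathbf{1}$ satisfies $Q_m b^*(\beta_0,1) = 0$.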

Remark 4 If we use the local $p$th-degree polynomial technique in the second-step estimation, we obtain the asymptotic bias $b_l(x\mid\beta_0)$ as

$$ b_l(x\mid\beta_0) = \frac{h_n^{p+1}}{(p+1)!} f(x\mid\beta_0)^{\gamma} r_\gamma^{(p+1)}(x,\beta_0) \int z^{p+1} H_p(z)\, dz, \qquad p \text{ odd}, $$

where $h_n$ is the bandwidth and $H_p(z)$ is the $p$th-order kernel function. If $K_n^{-1}$ and $h_n$ are equal and $p$ is odd, the absolute values of $b_a(x\mid\beta_0)$ and $b_l(x\mid\beta_0)$ differ only in the factors

$$ \sum_{j=1}^{K_n} I(\kappa_{j-1} < x < \kappa_j)\, B_{p+1}\!\left(\frac{x - \kappa_{j-1}}{K_n^{-1}}\right) \quad\text{and}\quad \int z^{p+1} H_p(z)\, dz. \qquad (5) $$

If we can evaluate (5), we can compare the bias of the SPSE with that of the semiparametric local polynomial kernel estimator. For example, when $p = 1$, it is easy to show that $|B_2(x)| = |x^2 - x + 1/6| \le 1/6 < 1/5$ for $x \in [0,1]$, while $\int z^2 H_G(z)\,dz = 1$ for the Gaussian kernel $H_G(z)$ and $\int z^2 H_E(z)\,dz = 1/5$ for the Epanechnikov kernel $H_E(z)$. Therefore $|b_a(x\mid\beta_0)|$ is smaller than $|b_l(x\mid\beta_0)|$ in this situation, which indicates that the SPSE is superior to the SLLE.
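The constants compared in Remark 4 can be verified numerically. This NumPy sketch evaluates $\max_{x\in[0,1]}|B_2(x)|$ on a fine grid and approximates $\int z^2 H(z)\,dz$ for the standard Gaussian kernel and for the Epanechnikov kernel $H_E(z) = \tfrac34(1-z^2)$ on $[-1,1]$ with the trapezoidal rule; the grid sizes and the truncation of the Gaussian integral to $[-10,10]$ are choices made for the sketch.

```python
import numpy as np

def second_moment(kernel, a, b, n=400_001):
    """Trapezoidal approximation of the integral of z^2 * kernel(z) over [a, b]."""
    z = np.linspace(a, b, n)
    w = np.full(n, z[1] - z[0])
    w[[0, -1]] /= 2                                      # trapezoidal end weights
    return float(np.sum(w * z**2 * kernel(z)))

def gaussian(z):
    return np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)

def epanechnikov(z):
    return 0.75 * (1 - z**2)                             # used on [-1, 1] only

xs = np.linspace(0, 1, 100_001)
max_abs_B2 = float(np.max(np.abs(xs**2 - xs + 1 / 6)))  # Bernoulli polynomial B_2

mu2_gauss = second_moment(gaussian, -10, 10)             # close to 1
mu2_epan = second_moment(epanechnikov, -1, 1)            # close to 1/5
```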

4 Parametric model selection 

In this section, we describe how to choose a parametric model. By Remark 3, if the true regression function satisfies $f \in \{f(\cdot\mid\beta) \mid \beta \in B \subset \mathbb{R}^m\}$, the bias of the SPSE is reduced. Hence we determine the initial parametric model from the viewpoint of bias reduction. Specifically, our purpose is to choose a parametric model such that the asymptotic bias of the SPSE becomes smaller than that of the NPSE:



$$ |b_a(x\mid\beta_0,\gamma)| < |b_a(x)| \quad\text{and}\quad |b_\lambda(x\mid\beta_0,\gamma)| < |b_\lambda(x)| \quad\text{for all } x \in (0,1), \qquad (6) $$



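where $b_a(x)$ and $b_\lambda(x)$ are the asymptotic biases of the NPSE. If $f(x\mid\beta)$ is constant, $b_a(x\mid\beta_0,\gamma)$ and $b_\lambda(x\mid\beta_0,\gamma)$ are equivalent to $b_a(x)$ and $b_\lambda(x)$, respectively. When the same $K_n$ and $\lambda_n$ are used in both the SPSE and the NPSE, (6) can be rewritten as $L_a(x,\gamma) > 0$ and $L_\lambda(x,\gamma) > 0$ for all $x \in (0,1)$, where

$$ L_a(x,\gamma) = |f^{(p+1)}(x)| - \left| f(x\mid\beta_0)^{\gamma} \left\{ \frac{f(x) - f(x\mid\beta_0)}{f(x\mid\beta_0)^{\gamma}} \right\}^{(p+1)} \right| $$

and

$$ L_\lambda(x,\gamma) = |B(x)'G(q)^{-1}Q_m b_f^*| - |f(x\mid\beta_0)^{\gamma} B(x)'G(q)^{-1}Q_m b^*(\beta_0,\gamma)|, $$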



where $b_f^*$ is a best $L_\infty$ approximation to $f(x)$. As a pilot estimator of $f$ and its $(p+1)$th derivative, we can use the local polynomial estimator $\hat f$ of degree $p+2$. Then the estimators of $L_a(x,\gamma)$ and $L_\lambda(x,\gamma)$ can be obtained as

$$ \hat L_a(x,\gamma) = |\hat f^{(p+1)}(x)| - \left| f(x\mid\hat\beta)^{\gamma} \left\{ \frac{\hat f(x) - f(x\mid\hat\beta)}{f(x\mid\hat\beta)^{\gamma}} \right\}^{(p+1)} \right| $$

and, in empirical form,

$$ \hat L_\lambda(x,\gamma) = |B(x)'\Lambda^{-1}Q_m(Z'Z)^{-1}Z'\hat f| - |f(x\mid\hat\beta)^{\gamma} B(x)'\Lambda^{-1}Q_m(Z'Z)^{-1}Z'\hat f_\gamma|, $$

where $\hat f = (\hat f(x_1) \cdots \hat f(x_n))'$ and $\hat f_\gamma$ is the $n$-vector with $i$th component $\{\hat f(x_i) - f(x_i\mid\hat\beta)\}/f(x_i\mid\hat\beta)^{\gamma}$. Here we use the fact that

$$ \lambda_n f(x\mid\hat\beta)^{\gamma} B(x)'\Lambda^{-1}Q_m(Z'Z)^{-1}Z'\hat f_\gamma = b_\lambda(x\mid\beta_0,\gamma) + o_P(\lambda_n K_n n^{-1}), $$

which is detailed in the proof of Theorem 2(a) of Claeskens et al. (2009). We choose one parametric model by relative evaluation. Let

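$$ C_{a\cap\lambda}(f(\cdot\mid\beta)) = \#\{z_j \in (0,1) \mid \hat L_a(z_j,\gamma) > 0,\ \hat L_\lambda(z_j,\gamma) > 0,\ j = 1, \dots, J\} $$

for a given parametric model $f(\cdot\mid\beta)$ and a finite set of grid points $\{z_j\}_{j=1}^{J}$ on $(0,1)$. Here, for a set $A$, $\#A$ denotes the cardinality of $A$. After preparing a class of candidate parametric models $\{f_k = f_k(\cdot\mid\beta_k);\ k = 1, \dots, K\}$, we choose the parametric model satisfying

$$ f(x\mid\hat\beta) = \mathop{\mathrm{argmax}}_{f_k} \{C_{a\cap\lambda}(f_k(\cdot\mid\hat\beta_k))\}. \qquad (7) $$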



In summary, for each parametric model $f_k$ we calculate $\hat L_a$, $\hat L_\lambda$ and $C_{a\cap\lambda}(f_k(\cdot\mid\hat\beta_k))$, and we construct the SPSE using the parametric model that satisfies (7). If we choose a good parametric model and a good $\gamma$, the SPSE will perform better than the NPSE.
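The selection rule (7) reduces to counting grid points. Below is a minimal sketch under stated assumptions: the values $\hat L_a(z_j,\gamma)$ and $\hat L_\lambda(z_j,\gamma)$ have already been computed for each candidate on the grid $z_j = j/J$, the candidate names and numbers are hypothetical, and ties are broken by taking the first candidate listed (the paper does not state a tie-breaking rule).

```python
import numpy as np

def criterion_counts(L_a, L_lam):
    """(C_a, C_lambda, C_{a cap lambda}): grid points with L_a > 0, L_lambda > 0, or both."""
    pos_a, pos_lam = np.asarray(L_a) > 0, np.asarray(L_lam) > 0
    return int(pos_a.sum()), int(pos_lam.sum()), int((pos_a & pos_lam).sum())

def select_model(candidates):
    """candidates: {name: (L_a on grid, L_lambda on grid)}. Returns the name that
    maximises C_{a cap lambda} as in rule (7) (first listed wins ties) and all scores."""
    scores = {name: criterion_counts(*vals)[2] for name, vals in candidates.items()}
    return max(scores, key=scores.get), scores

# Hypothetical values of L_a and L_lambda on the grid z_j = j / J, J = 100.
j = np.arange(1, 101)
candidates = {
    "model_A": (np.where(j <= 30, 1.0, -1.0), np.where(j <= 60, 1.0, -1.0)),
    "model_B": (np.ones(100), np.where(j % 2 == 0, 1.0, -1.0)),
}
best, scores = select_model(candidates)
```

In a simulation like that of Section 5, the name selected in each repetition could be tallied, for instance with `collections.Counter`, to obtain selection counts of the kind reported in the tables.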






Yoshida and Naito 



Remark 5 When we construct the semiparametric regression spline estimator (the SPSE with $\lambda_n = 0$), we obtain $b_\lambda(x\mid\beta_0,\gamma) = 0$. Therefore, $C_{a\cap\lambda}$ depends only on $\hat L_a(x,\gamma)$.

Remark 6 The bias term $b_a(x\mid\beta_0,\gamma)$ arises from the use of the B-spline model, whereas $b_\lambda(x\mid\beta_0,\gamma)$ arises from the penalty. If we use the regression spline, $b_\lambda(x\mid\beta_0,\gamma)$ vanishes and the bias of the estimator becomes smaller than that of the penalized spline estimator. However, the regression spline often overfits, so we use the penalized method to obtain a smooth curve. If $\lambda_n > 0$, a certain amount of smoothness in the estimator is assured. However, $b_\lambda(x\mid\beta_0,\gamma)$ may grow too large because of the influence of the parametric model. Therefore, under $\lambda_n > 0$, we suggest choosing $f(x\mid\beta)$ such that $|b_\lambda(x\mid\beta_0,\gamma)|$ becomes smaller than $|b_\lambda(x)|$. Hence, together with $\hat L_a(x,\gamma)$, the parametric model chosen by $C_{a\cap\lambda}$ is expected to give the SPSE both a good fit and smoothness.



5 Simulation 

In this section, we report a numerical study to confirm the performance of the SPSE in finite samples. We choose a parametric model by the criteria discussed in Section 4. We also compare the performance of the SPSE with those of the NPSE, the SLLE and the fully nonparametric local linear estimator (NLLE). In all situations, we use linear and cubic splines and the second-order difference penalty for the second-step nonparametric estimation. The SPSEs with linear and cubic splines are designated SPSE1 and SPSE3, respectively; NPSE1 and NPSE3 are labeled similarly. The number of knots and the smoothing parameter are determined by GCV. The design points $\{x_i\}_{i=1}^n$ are drawn from the uniform density on $[0,1]$, and the errors $\{\varepsilon_i\}_{i=1}^n$ are generated from the normal distribution with mean 0 and variance $\sigma^2(x_i)$. Let

$$ C_a = C_a(f(\cdot\mid\hat\beta)) = \#\{z_j \in (0,1) \mid \hat L_a(z_j,\gamma) > 0,\ j = 1, \dots, J\}, $$
$$ C_\lambda = C_\lambda(f(\cdot\mid\hat\beta)) = \#\{z_j \in (0,1) \mid \hat L_\lambda(z_j,\gamma) > 0,\ j = 1, \dots, J\}, $$
$$ C_{a\cap\lambda} = C_{a\cap\lambda}(f(\cdot\mid\hat\beta)) = \#\{z_j \in (0,1) \mid \hat L_a(z_j,\gamma) > 0,\ \hat L_\lambda(z_j,\gamma) > 0,\ j = 1, \dots, J\}, $$

where $z_j = j/J$ and $J = 100$. We prepare a class of candidate parametric models $\{f_k = f_k(\cdot\mid\beta_k) \mid k = 1, \dots, K\}$ and, for each $f_k$, calculate $C_a$, $C_\lambda$ and $C_{a\cap\lambda}$. We use $R = 1000$ repetitions. In each repetition, we pick the candidate model that maximizes $C_a$; the same procedure is applied to $C_\lambda$ and $C_{a\cap\lambda}$. Finally, we count the number of times each $f_k$ is picked over the repetitions. For comparison, we also show model selection by the AIC and the Takeuchi information criterion (TIC), detailed in Konishi and Kitagawa (2008).
Let

$$ B_j = \frac{1}{R}\sum_{r=1}^{R} \hat f_r(z_j) - f(z_j), \qquad V_j = \frac{1}{R}\sum_{r=1}^{R} \left\{ \hat f_r(z_j) - \frac{1}{R}\sum_{s=1}^{R} \hat f_s(z_j) \right\}^2, $$

where $\hat f_r(z_j)$ is the estimate in the $r$th repetition. Let $\mathrm{ISB} = 100^{-1}\sum_{j=1}^{100} B_j^2$, $V = 100^{-1}\sum_{j=1}^{100} V_j$ and $\mathrm{MISE} = \mathrm{ISB} + V$ be the estimates of the integrated squared bias, integrated variance and mean integrated squared error of $\hat f$, respectively. For comparison, the ISB, V and MISE of the SLLE and the NLLE were also calculated. For the SLLE and the NLLE, we used the Gaussian kernel, and the bandwidth $h_n$ was obtained by the direct plug-in approach (Ruppert et al. (1995)).
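The Monte Carlo summaries defined above can be computed from an $R \times J$ array of estimates $\hat f_r(z_j)$. In this sketch the function name and the toy numbers are illustrative; the divisor $1/R$ in $V_j$ follows the definition above.

```python
import numpy as np

def isb_v_mise(estimates, truth):
    """estimates: R x J array of f_hat_r(z_j); truth: length-J array of f(z_j).
    Returns (ISB, V, MISE) as grid averages of B_j^2 and V_j."""
    estimates = np.asarray(estimates, dtype=float)
    mean_est = estimates.mean(axis=0)
    B = mean_est - np.asarray(truth, dtype=float)        # B_j
    V = ((estimates - mean_est) ** 2).mean(axis=0)       # V_j with divisor R
    isb, v = float(np.mean(B**2)), float(np.mean(V))
    return isb, v, isb + v

# Toy example: two "repetitions" offset from the truth by +0.3 and +0.1.
z = np.arange(1, 101) / 100
truth = np.sin(2 * np.pi * z)
estimates = np.vstack([truth + 0.3, truth + 0.1])
isb, v, mise = isb_v_mise(estimates, truth)
```

Here every $B_j$ equals 0.2 and every $V_j$ equals 0.01, so the sketch should return ISB = 0.04, V = 0.01 and MISE = 0.05 up to rounding.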
Table 1: The results of parametric model selection in Example 1.

n = 25                   SPSE1                  SPSE3                 IC
       model      C_a    C_λ  C_a∩λ     C_a    C_λ  C_a∩λ      AIC    TIC
γ = 0  sin       1000    901   1000    1000   1000   1000      850    938
       poly1        -      -      -       -      -      -        -      -
       poly3        -     99      -       -      -      -      150     62
γ = 1  sin        997    917    974     997    837    953      850    938
       poly1        -     34      3       -     77      4        -      -
       poly3        1     33     20       3     86     43      150     62



Example 1 The true function is $f(x) = 2 + \sin(2\pi x)$. We use three different parametric models:

$$ f(x\mid\beta) = \begin{cases} \beta_0 + \beta_1 \sin(2\pi x), & f_1 = \text{sin},\\ \beta_0 + \beta_1 x, & f_2 = \text{poly1},\\ \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3, & f_3 = \text{poly3}. \end{cases} $$

The true curve belongs to the sin model (with $\beta_0 = 2$ and $\beta_1 = 1$). The curve poly1 is a rough model, and poly3 is close to the true $f$. The error variance is $\sigma^2(x) = (0.5)^2$ and the sample size is $n = 25$. The coefficients of each model are estimated by the maximum likelihood method. This setup is similar to that used by Glad (1998).

Table 1 gives the number of times each parametric model $f_k$ was chosen by each criterion. Under $C_a$, $C_\lambda$ and $C_{a\cap\lambda}$, sin was selected in almost all iterations. This result is desirable because the sin model contains the true function $f$. We also observe that the AIC and the TIC often choose sin. Judging by the number of times sin is chosen, $C_{a\cap\lambda}$ appears to be a better selector than the AIC and the TIC.

Results for the ISB, V and MISE of the SPSE and the NPSE are given in Table 2. The SPSE with sin achieves bias reduction even with a small sample size, and its variance and MISE are also smaller than those of the NPSE. In additive correction, the result of SPSE1 with poly1 is exactly the same as that of NPSE1 (see Corollary 1). If we use poly3, the MISE of the SPSE is smaller than that of the NPSE, although the squared bias is somewhat larger in multiplicative correction. In ISB, V and MISE alike, the values of the SPSE are smaller than those of the SLLE. We carried out the same analysis for $n = 200$; the ISB, V and MISE of the SPSE and those of the NPSE were almost the same, although these results are not shown in this paper.

Example 2 The same true function $f$ as in Example 1 is adopted, and the sample size is $n = 25$. The class of initial parametric models consists of $q$th-degree polynomials for $q = 1, \dots, 6$, designated poly1, ..., poly6, respectively, and $\sigma^2(x) = 1$. This class clearly does not contain the true $f$, and the estimator becomes unstable because the error variance is relatively large.

In Table 3, we tabulate the number of times out of 1000 repetitions that each polynomial model is selected by the bias-reduction criteria and the information criteria. In multiplicative correction, poly3 was selected most often by $C_a$, $C_\lambda$ and $C_{a\cap\lambda}$. In additive correction of SPSE1, poly3 was selected most often by $C_a$. In SPSE3, on the other hand, $C_a$ and $C_{a\cap\lambda}$ selected poly5. Finally, the AIC and the TIC most often selected poly3, followed by poly5. Our criteria and the information criteria thus tend to choose the same models.
Table 2: Results of integrated squared bias, variance and mean integrated squared error for Example 1. All entries for ISB, V and MISE are 10^3 times their actual values.

n = 25               SPSE1                    SPSE3                    SLLE
       model    ISB     V       MISE     ISB     V       MISE     ISB     V       MISE
γ = 0  sin      0.009   8.308   8.318    0.009   7.907   7.917    0.029   9.032   9.061
       poly1    1.450  12.111  13.562    1.110  10.056  11.166    2.370  14.105  16.476
       poly3    1.250  10.949  12.199    0.873   9.636  10.510    2.071  15.825  17.898
γ = 1  sin      0.011   8.394   8.405    0.010   8.292   8.302    0.026  10.708  10.734
       poly1    1.571  12.322  13.893    1.565  12.212  13.777    2.357  13.860  16.217
       poly3    2.016  11.198  13.215    1.016  10.198  11.215    2.942  12.472  15.415

Fully nonparametric  NPSE1                    NPSE3                    NLLE
method          ISB     V       MISE     ISB     V       MISE     ISB     V       MISE
                1.450  12.111  13.562    1.108  11.030  12.138    2.370  14.105  16.476



Table 3: The results of parametric model selection in Example 2.

n = 25                   SPSE1                  SPSE3                 IC
       model      C_a    C_λ  C_a∩λ     C_a    C_λ  C_a∩λ      AIC    TIC
γ = 0  poly1        -      -      -       -      -      -        -      -
       poly2        -     49     30       -      8      -        -      -
       poly3      956    472    511       -    939      -      415    693
       poly4        6     43      6       5      2     15      116      8
       poly5        6    356    312     967     37    982      306    298
       poly6        -      3     85      20      1      3      163      1
γ = 1  poly1        2     43     37       2     35     49        -      -
       poly2       13      4      6     173     44     46        -      -
       poly3      755    376    410     756    606    514      415    693
       poly4        -     15     71       -      -      1      116      8
       poly5      169    366    246      10    166    213      306    298
       poly6        3    119    135       1     35     49      163      1
Table 4: Results of integrated squared bias, variance and mean integrated squared error for Example 2. All entries for ISB, V and MISE are 10^3 times their actual values.

n = 25               SPSE1                      SPSE3                      SLLE
       model    ISB     V        MISE      ISB     V        MISE      ISB     V        MISE
γ = 0  poly1    1.213  232.429  233.643    1.417  256.275  257.692    1.991  246.245  248.236
       poly2    0.846  226.256  227.103    0.695  239.949  240.645    2.836  236.124  238.960
       poly3    0.776  225.508  226.285    0.729  210.204  210.933    1.157  243.466  244.623
       poly4    1.322  251.572  252.894    1.476  236.314  237.791    2.626  229.014  231.640
       poly5    0.161  251.777  251.938    0.122  238.596  238.717    0.128  277.704  277.832
       poly6    0.162  236.066  236.227    0.134  233.793  233.927    0.119  235.824  235.943
γ = 1  poly1    1.665  230.226  231.891    1.746  253.074  254.820    2.109  254.547  256.657
       poly2    0.534  268.503  269.037    0.321  225.818  226.138    2.871  256.551  259.421
       poly3    0.323  213.758  214.081    0.519  214.566  215.086    1.545  237.094  238.638
       poly4    0.924  233.528  234.452    0.735  245.211  245.956    2.858  259.805  262.662
       poly5    0.390  218.850  219.240    0.624  221.162  221.786    0.733  243.170  243.903
       poly6    0.356  241.451  241.807    0.678  241.242  241.920    0.895  240.767  241.662

Fully nonparametric  NPSE1                      NPSE3                      NLLE
method          ISB     V        MISE      ISB     V        MISE      ISB     V        MISE
                1.213  232.429  233.643    1.629  249.219  250.848    1.991  246.245  248.236



The ISB, V and MISE of the estimators are shown in Table 4. In additive correction, poly5 has the smallest ISB; we note that $C_{a\cap\lambda}$ chooses poly5 in SPSE3. In both corrections, poly3 has the smallest V and MISE among all models. On the whole, the SPSE behaves better than the SLLE, although there are some exceptions.

Example 3 The true function and the parametric models are the same as in Example 2, but the sample size is $n = 75$. We use the error variance $\sigma^2(x) = (x - 0.5)^2 + 0.1$. In this example, the parametric estimator is obtained by the ordinary least squares method.

Table 5 shows the results of the parametric model selection. In additive correction of SPSE1, $C_{a\cap\lambda}$ indicates that the best model is poly5, although $C_a$ selects poly3 every time. In multiplicative correction, poly3 is selected by $C_a$ many times, while $C_\lambda$ and $C_{a\cap\lambda}$ select poly5. From the definition of $C_{a\cap\lambda}$, poly5 is selected with both fit and smoothness taken into account. On the other hand, the AIC and the TIC choose poly3 and poly5, respectively. We note that the use of the AIC might not be appropriate in this situation, since the candidate models do not include the true $f$; hence we place more confidence in the TIC. Moreover, when we select the parametric model solely by the maximum of the log-likelihood, poly5 was chosen 1000 times. Therefore, the bias correction in the AIC seems to be too strong in this situation.

In Table 6, the ISB, V and MISE of the SPSE are tabulated. In both corrections, the SPSEs with poly5 and poly6 have far smaller ISBs than those with poly1 to poly4. Since $C_a$ and $C_\lambda$ focus on bias reduction, $C_{a\cap\lambda}$ appears to choose poly5 because it often has a small bias. On the other hand, poly3 has good V and MISE, while poly5 does not. For ISB, V and MISE, the values of the SPSE are smaller than the corresponding values of the SLLE.

Example 4 The true model is $f(x) = 4 + e^{-x}\{\sin(7\pi x) + 2\cos(3\pi x)\}$ and the error variance
Table 5: The results of parametric model selection in Example 3.

n = 75                   SPSE1                  SPSE3                 IC
       model      C_a    C_λ  C_a∩λ     C_a    C_λ  C_a∩λ      AIC    TIC
γ = 0  poly1        0      0      0       0      -      0        -      0
       poly2        -      5      -       -     65      -        -      -
       poly3     1000     47      8       -    142      -      457      -
       poly4        -      2    172       -     12     21       94      2
       poly5        -    604    630     945    624    872      296    950
       poly6        -    277    113      17     66     68      153     48
γ = 1  poly1        8      2      8       8     51     62        -      -
       poly2       64    222    168      62    150    118        -      -
       poly3      894     17     86     890    101    104      457      -
       poly4        -     72    104       -     20     31       94      2
       poly5        -    363    398       5    295    333      296    950
       poly6        -    182     85       3    253    214      153     48


Table 6: Results of integrated squared bias, variance and mean integrated squared error for Example 3. All entries for ISB, V and MISE are 10^3 times their actual values.

n = 75               SPSE1                    SPSE3                    SLLE
       model    ISB     V       MISE     ISB     V       MISE     ISB     V       MISE
γ = 0  poly1    0.061   1.330   1.390    0.065   1.237   1.302    0.645   6.529   7.175
       poly2    0.017   1.326   1.343    0.007   1.231   1.238    0.734   6.298   7.032
       poly3    0.017   1.325   1.343    0.007   1.230   1.237    0.249   6.292   6.541
       poly4    0.062   1.343   1.405    0.066   1.251   1.317    0.608   6.732   7.340
       poly5    0.003   1.377   1.380    0.002   1.285   1.287    0.017   4.863   4.880
       poly6    0.004   1.435   1.440    0.002   1.350   1.354    0.019   5.552   5.571
γ = 1  poly1    0.062   1.337   1.399    0.068   1.246   1.314    1.084   6.167   7.251
       poly2    0.024   1.328   1.352    0.021   1.235   1.256    0.997   6.186   7.183
       poly3    0.030   1.325   1.342    0.014   1.233   1.248    0.314   6.279   6.593
       poly4    0.072   1.348   1.419    0.078   1.258   1.336    0.420   6.476   6.896
       poly5    0.003   1.380   1.383    0.002   1.290   1.292    0.023   4.925   4.949
       poly6    0.003   1.438   1.441    0.002   1.353   1.355    0.025   5.528   5.553

Fully nonparametric  NPSE1                    NPSE3                    NLLE
method          ISB     V       MISE     ISB     V       MISE     ISB     V       MISE
                0.061   1.330   1.390    0.065   1.237   1.302    0.645   6.529   7.175
Table 7: The results of parametric model selection in Example 4.

[n = 50; number of times each candidate model (sin, cos, poly1, poly4, poly8) was selected by $C_a$, $C_\lambda$ and $C_{a\lambda}$ for SPSE1 and SPSE3 and by the information criteria (IC) AIC and TIC, for γ = 0 and γ = 1.]



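is $\sigma^2(x) = 0.5$. The parametric models are

$$
f(x|\beta) =
\begin{cases}
\beta_0 + e^{-x}\{\beta_1 + \beta_2\sin(7\pi x) + \beta_3\cos(3\pi x)\}, & f_1 = \text{sincos},\\
\beta_0 + e^{-x}\{\beta_1 + \beta_2\sin(7\pi x)\}, & f_2 = \text{sin},\\
\beta_0 + e^{-x}\{\beta_1 + \beta_2\cos(3\pi x)\}, & f_3 = \text{cos},\\
\beta_0 + e^{-x}(\beta_1 + \beta_2 x), & f_4 = \text{poly1},\\
\beta_0 + e^{-x}(\beta_1 + \beta_2 x + \cdots + \beta_5 x^4), & f_5 = \text{poly4},\\
\beta_0 + e^{-x}(\beta_1 + \beta_2 x + \cdots + \beta_9 x^8), & f_6 = \text{poly8}.
\end{cases}
$$

The model sincos corresponds to the true function.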
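All six candidate models in Example 4 are linear in $\beta$. If the parametric step is carried out by Gaussian maximum likelihood, it therefore reduces to ordinary least squares. The sketch below sets up the true function and the candidate designs under these assumptions: i.i.d. uniform design points on $[0, 1]$ (the design is not restated in this section), $n = 50$ and $\sigma^2(x) = 0.5$. It reads the sin model as $\beta_0 + e^{-x}\{\beta_1 + \beta_2\sin(7\pi x)\}$. The helpers `design` and `fit` are hypothetical names.

```python
import numpy as np


def f_true(x):
    """True regression function of Example 4."""
    return 4 + np.exp(-x) * (np.sin(7 * np.pi * x) + 2 * np.cos(3 * np.pi * x))


def design(x, model):
    """Design matrix of a candidate model beta_0 + e^{-x}{...}; every model is linear in beta."""
    e = np.exp(-x)
    if model == "sincos":
        cols = [e, e * np.sin(7 * np.pi * x), e * np.cos(3 * np.pi * x)]
    elif model == "sin":
        cols = [e, e * np.sin(7 * np.pi * x)]
    elif model == "cos":
        cols = [e, e * np.cos(3 * np.pi * x)]
    elif model.startswith("poly"):
        degree = int(model[4:])                       # poly1, poly4, poly8
        cols = [e * x ** k for k in range(degree + 1)]
    else:
        raise ValueError(f"unknown model: {model}")
    return np.column_stack([np.ones_like(x)] + cols)


def fit(x, y, model):
    """Least-squares estimate of beta (Gaussian ML for a model that is linear in beta)."""
    beta, *_ = np.linalg.lstsq(design(x, model), y, rcond=None)
    return beta


rng = np.random.default_rng(1)
n = 50
x = rng.uniform(0.0, 1.0, n)                          # assumed design: i.i.d. U(0, 1)
y = f_true(x) + rng.normal(0.0, np.sqrt(0.5), n)      # sigma^2(x) = 0.5
for model in ("sincos", "sin", "cos", "poly1", "poly4", "poly8"):
    beta = fit(x, y, model)
    rss = np.sum((y - design(x, model) @ beta) ** 2)
    print(f"{model:7s} {len(beta):2d} parameters, RSS = {rss:.2f}")
```

With noise-free responses, the sincos fit recovers $\beta = (4, 0, 1, 2)'$. This is the sense in which sincos corresponds to the true function.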

In Table 7, the results of the parametric model selection are tabulated. The model sincos, which corresponds to the true $f$, was not included among the candidates, since it would be chosen most frequently. In both corrections ($\gamma = 0, 1$), sin was chosen most often by $C_a$, $C_\lambda$ and $C_{a\lambda}$. On the other hand, TIC mostly selected poly4, and AIC often selected cos and poly4.

In Table 8, the ISB, V and MISE of the estimators are shown. In both corrections ($\gamma = 0, 1$), the SPSE with sin performs better than the SPSE with any other model except sincos. We observe that the SPSE with the initial parametric model selected by $C_{a\lambda}$ behaves better than the SPSE with the model selected by the information criteria.

Furthermore, the ISB, V and MISE of the SLLE with sincos are significantly smaller than those of the SPSE with any parametric model. On the other hand, if we use incorrect models (other than sincos) in the SLLE, the ISB and MISE of the SLLE are larger than those of the SPSE. The same holds for V, except that with sin the V of the SLLE is smaller than that of SPSE1.



Remark 7 In all examples, we also compared the behavior of the SPSE and the SLLE under the conditions that $K_n$ equals the ceiling of $h^{-1}$ and that $\lambda_n = n^{p/(2p+1)}$. These results confirm that the ISB of the SPSE is smaller than that of the SLLE for each parametric model. In contrast, the V and MISE of the SPSE are larger than those of the SLLE. Thus, the SPSE appears to overfit under these settings.



6 Discussion 



We have discussed the SPSE using a parametric model. We see that the SPSE behaves better than the NPSE, provided we can choose a good $f(x|\beta)$ in the first, parametric step.
Table 8: Results of integrated squared bias, variance and mean integrated squared error for Example 4. All entries for ISB, V and MISE are 10^3 times their actual values.

n = 50      SPSE1                      SPSE3                      SLLE
γ  model    ISB      V        MISE     ISB      V        MISE     ISB      V        MISE
0  sincos   0.051    87.361   87.412   0.041    81.564   81.605   0.025    64.752   64.777
0  sin      2.689    86.891   89.580   3.270    81.053   84.323   15.416   85.149   100.566
0  cos      17.206   87.095   104.302  13.195   86.217   99.411   21.039   92.615   113.654
0  poly1    19.095   89.314   108.409  13.950   88.674   102.624  25.920   104.183  130.103
0  poly4    15.990   91.930   107.920  11.733   90.234   101.967  25.716   106.923  132.639
0  poly8    16.492   94.013   110.505  11.896   92.078   103.975  22.992   108.436  131.428
1  sincos   0.051    88.492   88.543   0.040    82.978   83.018   0.025    63.735   63.761
1  sin      4.968    87.858   92.825   6.245    82.485   88.730   18.049   83.225   101.274
1  cos      17.269   89.904   107.174  12.525   89.165   101.690  20.751   92.491   113.242
1  poly1    18.981   90.991   109.972  13.360   90.053   103.413  28.430   94.194   122.624
1  poly4    15.451   94.073   109.524  11.155   92.079   103.233  24.959   106.714  131.673
1  poly8    15.534   95.991   111.525  10.936   93.630   104.566  26.838   106.554  133.392

n = 50      NPSE1                      NPSE3                      NLLE
            ISB      V        MISE     ISB      V        MISE     ISB      V        MISE
Fully nonparametric method:
            18.884   88.770   107.653  13.878   88.201   102.079  26.859   93.344   120.204



A similar conclusion can be drawn for the semiparametric regression spline estimator by letting $\lambda_n = 0$.

In the field of kernel smoothing, Fan et al. (2009) noted that the semiparametric local polynomial estimator can also be constructed in the additive model (Hastie and Tibshirani (1990)). This is possible because the asymptotic theory of nonparametric kernel regression in the additive model had already been developed by Ruppert and Opsomer (1997) and Opsomer (2000). By comparison, the asymptotic results for the penalized spline estimator have not yet been investigated as thoroughly as those for kernel smoothing. Although it is beyond the scope of this paper, this semiparametric approach with a penalized spline can also be extended to the generalized linear model. In this sense, many topics remain to be examined in theoretical studies of the penalized spline method.

Appendix 

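For a matrix $A_n = (a_{ij,n})_{ij}$, if $\max_{i,j}\{n^{a}|a_{ij,n}|\} = O_P(1)$ (resp. $o_P(1)$), then we write $A_n = O_P(n^{-a}\mathbf{1}\mathbf{1}')$ (resp. $o_P(n^{-a}\mathbf{1}\mathbf{1}')$), where $\mathbf{1}$ denotes a vector of ones. When $A_n$ is a vector, $A_n = O_P(n^{-a}\mathbf{1})$ (resp. $o_P(n^{-a}\mathbf{1})$) is defined as in the matrix case. This notation will be used both for matrices of fixed size and for matrices whose size depends on $n$. For the proofs of Proposition 1, Theorems 1–2 and Corollary 1, we define $\Lambda_n = n^{-1}\Lambda$. We need the following additional lemmas.

Lemma 1. Let $A = (a_{ij})_{ij}$ be a $(K_n + p)$ square matrix. Assume that $K_n \to \infty$ as $n \to \infty$ and $A = O_P(K_n^{a}\mathbf{1}\mathbf{1}')$. Then $A\Lambda_n^{-1} = O_P(K_n^{1+a}\mathbf{1}\mathbf{1}')$.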
Lemma 2. Let $g: \mathbb{R} \to \mathbb{R}$ be any function with $\sup_x |g(x)| < \infty$. Then $\int_0^1 B_i(u)g(u)\,du = O(K_n^{-1})$ and $\int_0^1 B_i(u)B_j(u)g(u)\,du = O(K_n^{-1})$.



Lemmas 1 and 2 follow from fundamental properties of B-splines (see Claeskens et al. (2009) and Zhou et al. (1998)).
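The first claim of Lemma 2 can be checked numerically for $g \equiv 1$. With equidistant knots of spacing $1/K_n$, each degree-$p$ B-spline whose support lies inside $[0, 1]$ integrates to exactly $1/K_n$. So $K_n \max_i \int_0^1 B_i(u)\,du$ should stay at 1 as $K_n$ grows. The sketch below uses its own Cox-de Boor implementation with equidistant extended knots (an assumption about knot placement) and a midpoint rule. `bspline_basis` is a hypothetical helper, not a library function.

```python
import numpy as np


def bspline_basis(x, K, p):
    """Degree-p B-spline basis with K equidistant intervals on [0, 1].

    Uses extended equidistant knots t_j = (j - p) / K, j = 0, ..., K + 2p,
    and the Cox-de Boor recursion; returns a (len(x), K + p) matrix.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = np.arange(-p, K + p + 1) / K
    # Degree 0: indicators of [t_j, t_{j+1}).
    N = ((x[:, None] >= t[:-1]) & (x[:, None] < t[1:])).astype(float)
    for d in range(1, p + 1):
        left = (x[:, None] - t[:-(d + 1)]) / (t[d:-1] - t[:-(d + 1)]) * N[:, :-1]
        right = (t[d + 1:] - x[:, None]) / (t[d + 1:] - t[1:-d]) * N[:, 1:]
        N = left + right
    return N


u = (np.arange(20_000) + 0.5) / 20_000                # midpoint grid on [0, 1]
ratios = {}
for K in (10, 20, 40):
    integrals = bspline_basis(u, K, 3).mean(axis=0)   # approximates int_0^1 B_i(u) du
    ratios[K] = K * integrals.max()                   # equals 1 if max_i int B_i = 1 / K
    print(K, round(ratios[K], 6))
```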

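Proof of Proposition 1. First we calculate the asymptotic expectation of $\hat r_\gamma(x,\beta_0)$:

$$E[\hat r_\gamma(x,\beta_0)|X_n] = B(x)'\Lambda^{-1}Z'E[\mathbf{r}_\gamma|X_n],$$

where

$$E[\mathbf{r}_\gamma|X_n] = \left(\frac{f(x_1) - f(x_1|\beta_0)}{f(x_1|\beta_0)^\gamma}, \ldots, \frac{f(x_n) - f(x_n|\beta_0)}{f(x_n|\beta_0)^\gamma}\right)'.$$

By Theorem 2(a) of Claeskens et al. (2009), with $\{f(x) - f(x|\beta_0)\}/f(x|\beta_0)^\gamma$ regarded as the regression function, we have

$$E[\hat r_\gamma(x,\beta_0)|X_n] = \frac{f(x) - f(x|\beta_0)}{f(x|\beta_0)^\gamma} + b_{a1}(x|\beta_0,\gamma) + b_{\lambda 1}(x|\beta_0,\gamma) + o_P(K_n^{-(p+1)}) + o_P(\lambda_n K_n n^{-1}),$$

where $b_{\lambda 1}(x|\beta_0,\gamma) = -(\lambda_n/n)B(x)'G(q)^{-1}Q_m b^*(\beta_0,\gamma)$. Therefore, the expectation of $\hat f_0(x,\gamma)$ can be written as

$$
\begin{aligned}
E[\hat f_0(x,\gamma)|X_n] &= f(x|\beta_0) + f(x|\beta_0)^\gamma E[\hat r_\gamma(x,\beta_0)|X_n]\\
&= f(x) + f(x|\beta_0)^\gamma\{b_{a1}(x|\beta_0,\gamma) + b_{\lambda 1}(x|\beta_0,\gamma)\} + o_P(K_n^{-(p+1)}) + o_P(\lambda_n K_n n^{-1})\\
&= f(x) + b_a(x|\beta_0,\gamma) + b_\lambda(x|\beta_0,\gamma) + o_P(K_n^{-(p+1)}) + o_P(\lambda_n K_n n^{-1}).
\end{aligned}
$$

Next we derive the asymptotic variance of $\hat f_0(x,\gamma)$. It is easy to see that

$$
\begin{aligned}
V[\hat f_0(x,\gamma)|X_n] &= f(x|\beta_0)^{2\gamma}B(x)'\Lambda^{-1}Z'V[\mathbf{r}_\gamma|X_n]Z\Lambda^{-1}B(x)\\
&= \frac{f(x|\beta_0)^{2\gamma}}{n^2}B(x)'\Lambda_n^{-1}Z'\,\mathrm{diag}\!\left(\frac{\sigma^2(x_1)}{f(x_1|\beta_0)^{2\gamma}}, \ldots, \frac{\sigma^2(x_n)}{f(x_n|\beta_0)^{2\gamma}}\right)Z\Lambda_n^{-1}B(x).
\end{aligned}
$$

The $(i,j)$-component of $n^{-1}Z'V[\mathbf{r}_\gamma|X_n]Z$ can be calculated as

$$\left[\frac{1}{n}Z'\,\mathrm{diag}\!\left(\frac{\sigma^2(x_1)}{f(x_1|\beta_0)^{2\gamma}}, \ldots, \frac{\sigma^2(x_n)}{f(x_n|\beta_0)^{2\gamma}}\right)Z\right]_{ij} = \frac{1}{n}\sum_{k=1}^n B_i(x_k)B_j(x_k)\frac{\sigma^2(x_k)}{f(x_k|\beta_0)^{2\gamma}}.$$

Hence, we obtain

$$V[\hat f_0(x,\gamma)|X_n] = \frac{f(x|\beta_0)^{2\gamma}}{n}B(x)'G(q)^{-1}G(\sigma,\gamma,q)G(q)^{-1}B(x) + o_P(K_n n^{-1}).$$

□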
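Before limits are taken, the first variance expression in this proof is an exact finite-sample formula. Given $X_n$, $\hat f_0(x,\gamma)$ is affine in $y$ with weights $w_i = f(x|\beta_0)^\gamma[Z\Lambda^{-1}B(x)]_i/f(x_i|\beta_0)^\gamma$, so $V[\hat f_0(x,\gamma)|X_n] = \sum_i w_i^2\sigma^2(x_i)$. The sketch below evaluates the matrix form and compares it with a Monte Carlo variance. It assumes $\Lambda = Z'Z + \lambda_n Q_m$ with $Q_m = D_m'D_m$ built from $m$th-order differences, equidistant knots, and illustrative choices of $f(\cdot|\beta_0)$, $\sigma^2(\cdot)$, $K_n$, $p$, $m$ and $\lambda_n$. None of these numbers come from the paper.

```python
import numpy as np


def bspline_basis(x, K, p):
    """Degree-p B-spline basis on [0, 1], K equidistant intervals (Cox-de Boor)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = np.arange(-p, K + p + 1) / K
    N = ((x[:, None] >= t[:-1]) & (x[:, None] < t[1:])).astype(float)
    for d in range(1, p + 1):
        left = (x[:, None] - t[:-(d + 1)]) / (t[d:-1] - t[:-(d + 1)]) * N[:, :-1]
        right = (t[d + 1:] - x[:, None]) / (t[d + 1:] - t[1:-d]) * N[:, 1:]
        N = left + right
    return N


def penalty(K, p, m):
    """Q_m = D_m' D_m, with D_m the m-th order difference matrix on K + p coefficients."""
    D = np.diff(np.eye(K + p), n=m, axis=0)
    return D.T @ D


def f_par(t):
    """Hypothetical f(.|beta_0), positive on [0, 1] so gamma = 1 is allowed."""
    return 2.0 + t


def sigma2(t):
    """Hypothetical error variance function sigma^2(.)."""
    return 0.1 + 0.2 * t


rng = np.random.default_rng(2)
n, K, p, m, lam, gamma, x0 = 200, 10, 3, 2, 1.0, 1, 0.5
x = np.sort(rng.uniform(0.0, 1.0, n))

Z = bspline_basis(x, K, p)
Lam = Z.T @ Z + lam * penalty(K, p, m)                  # Lambda = Z'Z + lambda_n Q_m
a = np.linalg.solve(Lam, bspline_basis(x0, K, p)[0])    # Lambda^{-1} B(x0)

# f(x0)^{2 gamma} B(x0)' Lambda^{-1} Z' diag(sigma^2 / f^{2 gamma}) Z Lambda^{-1} B(x0)
v_formula = f_par(x0) ** (2 * gamma) * (a @ (Z.T * (sigma2(x) / f_par(x) ** (2 * gamma))) @ Z @ a)

# Linear weights: f0_hat(x0) - E[f0_hat(x0) | X_n] = sum_i w_i * error_i
w = f_par(x0) ** gamma * (Z @ a) / f_par(x) ** gamma
errors = rng.normal(size=(20_000, n)) * np.sqrt(sigma2(x))  # replicated error vectors
v_mc = np.var(errors @ w)
print(v_formula, v_mc)
```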
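Before the proof of Theorem 1, we define some symbols. For any function $g(\cdot|\beta)$ that is smooth in $\beta$, let

$$g^{(1)}(\cdot|\beta_0) = \left.\frac{\partial g(\cdot|\beta)}{\partial\beta}\right|_{\beta=\beta_0}, \qquad g^{(2)}(\cdot|\beta_0) = \left.\frac{\partial^2 g(\cdot|\beta)}{\partial\beta\,\partial\beta'}\right|_{\beta=\beta_0}.$$

We use the Taylor expansion of $g(\cdot|\hat\beta)$ around $\beta_0$, giving

$$g(\cdot|\hat\beta) = g(\cdot|\beta_0) + g^{(1)}(\cdot|\beta_0)'(\hat\beta - \beta_0) + \frac{1}{2}(\hat\beta - \beta_0)'g^{(2)}(\cdot|\beta_0)(\hat\beta - \beta_0) + o_P(n^{-1}).$$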
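Proof of Theorem 1. We first note from (2) that the SPSE is expressed as

$$\hat f(x,\gamma) = f(x|\hat\beta) + B(x)'\Lambda^{-1}Z'\mathbf{r}_\gamma(\hat\beta), \tag{8}$$

where

$$\mathbf{r}_\gamma(\beta) = (r_\gamma(y_1|\beta), \ldots, r_\gamma(y_n|\beta))' \tag{9}$$

and $r_\gamma(y_i|\beta) = f(x|\beta)^\gamma\{y_i - f(x_i|\beta)\}/f(x_i|\beta)^\gamma$. The Taylor expansion yields

$$\hat f(x,\gamma) = \hat f_0(x,\gamma) + \hat f^{(1)}(x,\gamma)'(\hat\beta - \beta_0) + \frac{1}{2}(\hat\beta - \beta_0)'\hat f^{(2)}(x,\gamma)(\hat\beta - \beta_0) + o_P(n^{-1}),$$

where

$$\hat f^{(1)}(x,\gamma) = f^{(1)}(x|\beta_0) + \sum_{j=1}^n\{B(x_j)'\Lambda^{-1}B(x)\}\,r_\gamma^{(1)}(y_j|\beta_0)$$

and

$$\hat f^{(2)}(x,\gamma) = f^{(2)}(x|\beta_0) + \sum_{j=1}^n\{B(x_j)'\Lambda^{-1}B(x)\}\,r_\gamma^{(2)}(y_j|\beta_0).$$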

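First we derive the asymptotic expectation of $\hat f(x,\gamma)$. The term $E[\hat f_0(x,\gamma)|X_n]$ has already been derived in Proposition 1. Direct calculations, with repeated use of Lemmas 1 and 2, yield

$$E\left[f^{(1)}(x|\beta_0)'\left\{\frac{1}{n}\sum_{a=1}^n l(x_a,Y_a) + \frac{d}{n} + \delta_n\right\}\Big|X_n\right] = \frac{1}{n}E[f^{(1)}(x|\beta_0)'d\,|X_n] + O(n^{-2})$$

and

$$
\begin{aligned}
&E\left[\sum_{j=1}^n\{B(x_j)'\Lambda^{-1}B(x)\}\,r_\gamma^{(1)}(y_j|\beta_0)'\left\{\frac{1}{n}\sum_{a=1}^n l(x_a,Y_a) + \frac{d}{n} + \delta_n\right\}\Big|X_n\right]\\
&\quad = \frac{1}{n}\sum_{j=1}^n\{B(x_j)'\Lambda_n^{-1}B(x)\}\,E\left[r_\gamma^{(1)}(y_j|\beta_0)'\left\{\frac{1}{n}\sum_{a=1}^n l(x_a,Y_a) + \frac{d}{n} + \delta_n\right\}\Big|X_n\right]\\
&\quad = O_P(n^{-1}).
\end{aligned}
$$

Hence we obtain

$$E[\hat f^{(1)}(x,\gamma)'(\hat\beta - \beta_0)|X_n] = O_P(n^{-1}). \tag{10}$$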
Analogously,

$$E[(\hat\beta - \beta_0)'\hat f^{(2)}(x,\gamma)(\hat\beta - \beta_0)|X_n] = O_P(n^{-1}) \tag{11}$$

can also be shown. The terms in (10) and (11) are of smaller order than the bias terms of $\hat f_0(x,\gamma)$. Therefore, the bias of $\hat f(x,\gamma)$ is essentially dominated by the bias of $\hat f_0(x,\gamma)$.

Next we turn to the variance of $\hat f(x,\gamma)$. Direct evaluation gives

$$V[\hat f^{(1)}(x,\gamma)'(\hat\beta - \beta_0)|X_n] = O_P(n^{-1}),$$

and simple but tedious calculations finally yield

$$V[(\hat\beta - \beta_0)'\hat f^{(2)}(x,\gamma)(\hat\beta - \beta_0)|X_n] = O_P(n^{-2}).$$

Both are negligible compared with $V[\hat f_0(x,\gamma)|X_n]$, which is of order $K_n n^{-1}$ with $K_n \to \infty$. All covariance terms arising from the right-hand side of the Taylor expansion of $\hat f(x,\gamma)$ can be shown to be of negligible order by the Cauchy–Schwarz inequality. Hence, the variance of $\hat f(x,\gamma)$ is dominated by that of $\hat f_0(x,\gamma)$. □

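Proof of Theorem 2. Let $\hat r(x,\gamma) = B(x)'\Lambda^{-1}Z'\mathbf{r}_\gamma(\hat\beta)$. Then the semiparametric estimator can be written as $\hat f(x,\gamma) = f(x|\hat\beta) + \hat r(x,\gamma)$. We now prove

$$\frac{\hat f(x,\gamma) - E[\hat f(x,\gamma)|X_n]}{\sqrt{V[\hat f(x,\gamma)|X_n]}} \xrightarrow{D} N(0,1) \tag{12}$$

by using the Lyapunov theorem. First, from $\sqrt{n}\,(f(x|\hat\beta) - E[f(x|\hat\beta)|X_n]) = O_P(1)$ and $V[\hat f(x,\gamma)|X_n] = O(K_n n^{-1})$, we have

$$\frac{f(x|\hat\beta) - E[f(x|\hat\beta)|X_n]}{\sqrt{V[\hat f(x,\gamma)|X_n]}} \xrightarrow{P} 0.$$

Therefore, (12) can be obtained, provided that

$$\frac{\hat r(x,\gamma) - E[\hat r(x,\gamma)|X_n]}{\sqrt{V[\hat r(x,\gamma)|X_n]}} \xrightarrow{D} N(0,1), \tag{13}$$

because $V[\hat f(x,\gamma)|X_n]/V[\hat r(x,\gamma)|X_n] \to 1$ as $n \to \infty$. Furthermore, from the proof of Theorem 1 we obtain

$$\frac{\hat r(x,\gamma) - \hat r_0(x,\gamma)}{\sqrt{V[\hat r_0(x,\gamma)|X_n]}} \xrightarrow{P} 0 \quad (n \to \infty)$$

and $V[\hat r(x,\gamma)|X_n]/V[\hat r_0(x,\gamma)|X_n] \to 1$ as $n \to \infty$, where

$$\hat r_0(x,\gamma) = B(x)'\Lambda^{-1}Z'\mathbf{r}_\gamma(\beta_0) = f(x|\beta_0)^\gamma\sum_{i=1}^n\{B(x)'\Lambda^{-1}B(x_i)\}\frac{y_i - f(x_i|\beta_0)}{f(x_i|\beta_0)^\gamma}.$$

From now on, we show

$$\frac{\hat r_0(x,\gamma) - E[\hat r_0(x,\gamma)|X_n]}{\sqrt{V[\hat r_0(x,\gamma)|X_n]}} \xrightarrow{D} N(0,1) \tag{14}$$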
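by applying the Lyapunov theorem. First we see that

$$\hat r_0(x,\gamma) - E[\hat r_0(x,\gamma)|X_n] = f(x|\beta_0)^\gamma\sum_{i=1}^n\{B(x)'\Lambda^{-1}B(x_i)\}\frac{\varepsilon_i}{f(x_i|\beta_0)^\gamma}.$$

It is easily confirmed that

$$f(x|\beta_0)^\gamma B(x)'\Lambda^{-1}B(x_i) = O_P(K_n n^{-1}).$$

By the above evaluations and the moment condition for $\varepsilon_i$, we have

$$E\left[\left|\frac{f(x|\beta_0)^\gamma B(x)'\Lambda^{-1}B(x_i)\,\varepsilon_i}{f(x_i|\beta_0)^\gamma}\right|^{2+\delta}\Big|X_n\right] = \frac{E[|f(x|\beta_0)^\gamma B(x)'\Lambda^{-1}B(x_i)\,\varepsilon_i|^{2+\delta}\,|X_n]}{|f(x_i|\beta_0)|^{\gamma(2+\delta)}} = O_P\left(\left(\frac{K_n}{n}\right)^{2+\delta}\right).$$

On the other hand, since $B_n^2 = V[\hat r_0(x,\gamma)|X_n] = O_P(K_n n^{-1})$, we have

$$B_n^{2+\delta} = O_P\left(\left(\frac{K_n}{n}\right)^{(2+\delta)/2}\right).$$

Then it follows that

$$\frac{1}{B_n^{2+\delta}}\sum_{i=1}^n E\left[\left|f(x|\beta_0)^\gamma\{B(x_i)'\Lambda^{-1}B(x)\}\frac{\varepsilon_i}{f(x_i|\beta_0)^\gamma}\right|^{2+\delta}\Big|X_n\right] = O_P\left(\frac{n\,(K_n/n)^{2+\delta}}{(K_n/n)^{(2+\delta)/2}}\right),$$

which tends to 0 in probability since $K_n = o(n^{1/2})$ and $\delta > 2$. This assures the Lyapunov condition, so that (14) holds. Note that $b_a(x|\beta_0,\gamma) = O(K_n^{-(p+1)})$, $b_\lambda(x|\beta_0,\gamma) = O(\lambda_n K_n n^{-1})$ and $V[\hat f(x,\gamma)|X_n] = O(K_n n^{-1})$. It follows from these evaluations and the assumptions on the orders of $K_n$ and $\lambda_n$ that

$$\frac{E[\hat f(x,\gamma)|X_n] - f(x) - b_a(x|\beta_0,\gamma) - b_\lambda(x|\beta_0,\gamma)}{\sqrt{V[\hat f(x,\gamma)|X_n]}} \xrightarrow{P} 0,$$

which completes the proof.

□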
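The rate claim in the Lyapunov step can be made explicit. This assumes, as the proof does, that each of the $n$ summands is $O_P((K_n/n)^{2+\delta})$ and that $B_n^2$ is exactly of order $K_n/n$. Substituting $K_n = o(n^{1/2})$ then gives

$$
\frac{n\,(K_n/n)^{2+\delta}}{(K_n/n)^{(2+\delta)/2}}
= n\left(\frac{K_n}{n}\right)^{(2+\delta)/2}
= n^{-\delta/2}K_n^{1+\delta/2}
= o\!\left(n^{-\delta/2}\,n^{(2+\delta)/4}\right)
= o\!\left(n^{(2-\delta)/4}\right),
$$

which tends to 0 whenever $\delta \ge 2$. The condition $\delta > 2$ is therefore sufficient.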



Proof of Corollary 1. First, $f_q(x|\beta_q)$ can be expressed as a linear combination of the $p$th-degree B-spline basis. Indeed, from a fundamental property of the B-spline basis (see p. 95 of de Boor (2001)), each power $x^{p-j}$ $(j = 0, \ldots, p)$ can be written as

$$x^{p-j} = \sum_{k=-p+1}^{K_n}\frac{(-1)^j(p-j)!}{p!}\,\psi_{k,p}^{(j)}(0)\,B_k(x),$$

where $\psi_{k,p}(z) = (\kappa_k - z)\cdots(\kappa_{k+p-1} - z)$, and we have

$$
\begin{aligned}
f_q(x|\beta_q) &= \beta_0 + \beta_1 x + \cdots + \beta_q x^q = \sum_{j=p-q}^{p}\beta_{p-j}\,x^{p-j}\\
&= \sum_{k=-p+1}^{K_n}\left\{\sum_{j=p-q}^{p}\beta_{p-j}\frac{(-1)^j(p-j)!}{p!}\,\psi_{k,p}^{(j)}(0)\right\}B_k(x).
\end{aligned}\tag{15}
$$
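The coefficient formula behind (15) can be traced to Marsden's identity. Assume the cited property takes the form $(x - z)^p = \sum_{k=-p+1}^{K_n}\psi_{k,p}(z)B_k(x)$, with $\psi_{k,p}$ as defined in this proof. Differentiating $j$ times in $z$ and setting $z = 0$ then gives the representation of $x^{p-j}$:

$$
\left.\frac{\partial^j}{\partial z^j}(x - z)^p\right|_{z=0} = (-1)^j\frac{p!}{(p-j)!}\,x^{p-j} = \sum_{k=-p+1}^{K_n}\psi_{k,p}^{(j)}(0)\,B_k(x)
\quad\Longrightarrow\quad
x^{p-j} = \sum_{k=-p+1}^{K_n}\frac{(-1)^j(p-j)!}{p!}\,\psi_{k,p}^{(j)}(0)\,B_k(x).
$$

For $p = 1$, $\psi_{k,1}(z) = \kappa_k - z$. The two cases $j = 0, 1$ then reduce to $x = \sum_k \kappa_k B_k(x)$ and $1 = \sum_k B_k(x)$.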

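Note that (15) holds for any $\beta_q \in \mathcal{B}$. The semiparametric penalized spline estimator is obtained as $\hat f(x,0) = f_q(x|\hat\beta_q) + \hat r_0(x,\hat\beta_q)$. Let $c = (c_{-p+1}, \ldots, c_{K_n})'$ be the $(K_n + p)$ vector defined by

$$c_k = \sum_{j=p-q}^{p}\hat\beta_{p-j}\frac{(-1)^j(p-j)!}{p!}\,\psi_{k,p}^{(j)}(0), \qquad k = -p+1, \ldots, K_n.$$

Then we have $f_q(x|\hat\beta_q) = B(x)'c$ and

$$\hat r_0(x,\hat\beta_q) = B(x)'\hat b = B(x)'(Z'Z + \lambda_n Q_m)^{-1}Z'(y - Zc).$$

Therefore,

$$\hat f(x,0) = f_q(x|\hat\beta_q) + \hat r_0(x,\hat\beta_q) = B(x)'c + B(x)'(Z'Z + \lambda_n Q_m)^{-1}Z'(y - Zc). \tag{16}$$

When $\lambda_n = 0$, meaning that $\hat r_0(x,\hat\beta_q)$ is a regression spline, (16) can be written as

$$\hat f(x,0) = B(x)'c + B(x)'(Z'Z)^{-1}Z'(y - Zc) = B(x)'(Z'Z)^{-1}Z'y$$

for all $p \geq 1$. So the semiparametric and nonparametric estimators have the same form. If $\lambda_n > 0$, on the other hand,

$$\hat f(x,0) - B(x)'(Z'Z + \lambda_n Q_m)^{-1}Z'y = B(x)'c - B(x)'(Z'Z + \lambda_n Q_m)^{-1}Z'Zc = \lambda_n B(x)'(Z'Z + \lambda_n Q_m)^{-1}Q_m c,$$

so $\hat f(x,0)$ does not coincide with the nonparametric estimator unless $Q_m c = 0$. However, as long as we use $(p, m) = (1, 2)$ and equidistant knots, we obtain $Q_m c = 0$. The square matrix $Q_2$ of order $(K_n + p)$ has the form $Q_2 = D_2'D_2$, where the $(K_n + p - 2) \times (K_n + p)$ matrix $D_2$ is

$$D_2 = (d_{ij})_{ij} = \begin{pmatrix} 1 & -2 & 1 & & & 0\\ & 1 & -2 & 1 & & \\ & & \ddots & \ddots & \ddots & \\ 0 & & & 1 & -2 & 1 \end{pmatrix}.$$

We need only prove $D_2 c = 0$. The $k$th component of $c$ is a linear combination of $\psi_{k,p}^{(j)}(0)$, $j = 0, \ldots, p$, with coefficients that do not depend on $k$. It therefore suffices to show that, for $p = 1$ and $j = 0, 1$,

$$\sum_{k=-p+1}^{K_n} d_{ik}\,\psi_{k,p}^{(j)}(0) = 0, \qquad i = 1, \ldots, K_n + p - 2.$$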






Yoshida and Naito 



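By the definition of $d_{ik}$ and $\psi_{k,1}(z) = \kappa_k - z$, we have for $j = 0$

$$\sum_{k=-p+1}^{K_n} d_{ik}\,\psi_{k,1}^{(0)}(0) = d_{i,i}\kappa_i + d_{i,i+1}\kappa_{i+1} + d_{i,i+2}\kappa_{i+2} = \kappa_i - 2\kappa_{i+1} + \kappa_{i+2} = 0,$$

because the knots are equidistant. For $j = 1$, since $\psi_{k,1}^{(1)}(0) = -1$ for all $k$, we obtain $\sum_{k=-p+1}^{K_n} d_{ik}\,\psi_{k,1}^{(1)}(0) = -(1 - 2 + 1) = 0$. Therefore, $D_2 c = 0$ is proven. □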
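Corollary 1 can also be checked numerically. The sketch below makes these assumptions: equidistant knots (as the proof requires), $(p, m) = (1, 2)$, $\gamma = 0$, the poly1 model $\beta_0 + \beta_1 x$ fitted by least squares, and an arbitrary $\lambda_n > 0$. It builds $c_k = \hat\beta_0 + \hat\beta_1\kappa_k$, which is the case $p = 1$ of the coefficient formula, with $\kappa_k = k/K_n$ the peak of the $k$th hat function. It then confirms $D_2 c = 0$ and compares the SPSE with the NPSE on a grid. The helper names and the test function that generates $y$ are hypothetical.

```python
import numpy as np


def bspline_basis(x, K, p):
    """Degree-p B-spline basis on [0, 1] with K equidistant intervals (Cox-de Boor).

    Extended knots t_j = (j - p) / K; returns a (len(x), K + p) matrix.
    For p = 1 the k-th column is the hat function peaking at k / K.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = np.arange(-p, K + p + 1) / K
    N = ((x[:, None] >= t[:-1]) & (x[:, None] < t[1:])).astype(float)
    for d in range(1, p + 1):
        left = (x[:, None] - t[:-(d + 1)]) / (t[d:-1] - t[:-(d + 1)]) * N[:, :-1]
        right = (t[d + 1:] - x[:, None]) / (t[d + 1:] - t[1:-d]) * N[:, 1:]
        N = left + right
    return N


def spse_minus_npse(K=10, lam=5.0, n=100, seed=0):
    """Return (max |D_2 c|, max |SPSE - NPSE|) for (p, m) = (1, 2), gamma = 0, poly1."""
    rng = np.random.default_rng(seed)
    p = 1
    x = np.sort(rng.uniform(0.0, 1.0, n))
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)   # hypothetical data
    Z = bspline_basis(x, K, p)
    D2 = np.diff(np.eye(K + p), n=2, axis=0)              # rows (1, -2, 1)
    Lam = Z.T @ Z + lam * D2.T @ D2                        # Z'Z + lambda_n Q_2

    X1 = np.column_stack([np.ones(n), x])                 # poly1: beta_0 + beta_1 x
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]
    c = beta[0] + beta[1] * np.arange(K + p) / K          # c_k = beta_0 + beta_1 kappa_k

    grid = np.linspace(0.0, 1.0, 101)
    Bg = bspline_basis(grid, K, p)
    resid = y - X1 @ beta                                  # gamma = 0 residuals
    spse = beta[0] + beta[1] * grid + Bg @ np.linalg.solve(Lam, Z.T @ resid)
    npse = Bg @ np.linalg.solve(Lam, Z.T @ y)
    return np.abs(D2 @ c).max(), np.abs(spse - npse).max()


print(spse_minus_npse())
```

Both returned quantities are at rounding-error level, which is the numerical counterpart of $Q_2 c = 0$.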